Regular backups of your server cluster are essential for high availability. This topic explains how to use the Backup or Restore Wizard to back up cluster nodes, describes ten cluster failure scenarios, and offers data restore solutions for each scenario using the Backup or Restore Wizard and recovery utilities from the Microsoft Windows Server 2003 Resource Kit.
For more information on backup and restore procedures, see Backing up and restoring data.
In a server cluster, four groups of data are critical to the proper operation of the cluster: the disk signatures and partitions of the cluster disks, the cluster quorum data, the data on the cluster disks, and the data on the individual cluster nodes.
Cluster disk signatures and partitions
Before you back up any data on the server cluster nodes, back up the cluster disk signatures and partitions using Automated System Recovery in the Backup Wizard. You need this backup if you later have to restore the signature of the quorum disk, for example, after a complete system failure in which the quorum disk signature has changed since your last backup.
For information, see To back up cluster disk signatures and partition layouts.
Cluster quorum data
When you back up data on a server cluster node, make sure you also back up the cluster quorum. The cluster quorum is important because it contains the current cluster configuration, application registry checkpoints, and the cluster recovery log.
You can use the Backup Wizard to back up the cluster quorum data by performing a System State backup from any node on which the Cluster service is running.
For information, see To back up the cluster quorum.
Data on the cluster disks
To back up all cluster disks owned by a node, perform a full backup from that node.
You can also back up this data through a network connection to a hidden administrative file share. For example, you might use the New Resource Wizard to create FBackup$, GBackup$, and HBackup$ file shares for the root of drives F, G, and H, respectively. These shares would not appear in the browse list and could be configured to allow access only to members of the Backup Operators group.
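The share names in the example follow a simple pattern: the drive letter, the word Backup, and a trailing $ that keeps the share out of the browse list. The following Python sketch builds those names and the UNC paths a backup server might use to reach them. The helper names and the network name CLUSTERFS are hypothetical; the shares themselves would still be created as file share resources with the New Resource Wizard.

```python
# Illustrative sketch only. The helper names and the "CLUSTERFS" network name
# are hypothetical; the naming pattern mirrors the FBackup$/GBackup$/HBackup$ example.


def hidden_backup_share(drive_letter: str) -> str:
    """Return a hidden share name for a drive root, for example 'F' -> 'FBackup$'.

    The trailing '$' is what keeps the share out of the network browse list.
    """
    letter = drive_letter.strip().rstrip(":").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    return f"{letter}Backup$"


def unc_path(server: str, drive_letter: str) -> str:
    """Build the UNC path a backup host would use to reach the hidden share."""
    return f"\\\\{server}\\{hidden_backup_share(drive_letter)}"


if __name__ == "__main__":
    # Print the paths for the F, G, and H drives from the example.
    for drive in ("F", "G", "H"):
        print(unc_path("CLUSTERFS", drive))
```

Access to these shares would still be restricted through share permissions, for example to members of the Backup Operators group; the names alone provide no security.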
For information on backing up data on the cluster disks, see To back up data on cluster nodes.
Data on the individual cluster nodes
After you back up the cluster quorum disk on one node, it is not necessary to back up the quorum on the remaining cluster nodes. However, you may want to back up the clustering software, cluster administrative software, system state, and application data on the remaining nodes.
For information on backing up data on individual cluster nodes, see To back up data on cluster nodes.
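The backup guidance above implies an order of operations: disk signatures and partitions first, then one System State backup (which includes the quorum) from a node running the Cluster service, then full backups of the cluster disks from their owning nodes, and finally the node-local data on the remaining nodes. The Python sketch below only models that ordering as a checklist; it does not call any Windows backup API, and the Node type and step wording are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative model only: Node and the step strings are hypothetical and
# simply restate the ordering described in this topic.


@dataclass
class Node:
    name: str
    cluster_service_running: bool
    owned_disks: list = field(default_factory=list)


def backup_plan(nodes: list) -> list:
    """Return an ordered list of backup steps for a server cluster."""
    steps = []
    # 1. Disk signatures and partition layouts first, using Automated System Recovery.
    steps.append("ASR backup of cluster disk signatures and partitions")
    # 2. Quorum: one System State backup from any node running the Cluster service.
    quorum_node = next((n for n in nodes if n.cluster_service_running), None)
    if quorum_node is None:
        raise RuntimeError("no node is running the Cluster service; the quorum cannot be backed up")
    steps.append(f"System State backup on {quorum_node.name} (includes cluster quorum)")
    # 3. Cluster disks: a full backup from the node that owns each disk.
    for n in nodes:
        if n.owned_disks:
            steps.append(f"Full backup on {n.name} of disks {', '.join(n.owned_disks)}")
    # 4. Remaining nodes: node-local data only; the quorum is already covered.
    for n in nodes:
        if n is not quorum_node:
            steps.append(f"Back up local system state and application data on {n.name}")
    return steps


if __name__ == "__main__":
    cluster = [Node("NODE1", False, ["F"]), Node("NODE2", True, ["G", "H"])]
    for step in backup_plan(cluster):
        print(step)
```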
This section describes ten failure scenarios that will require restoring your cluster. The type of failure you experience determines the steps you must follow.
Scenario 1—Cluster Disk Data Loss
If you have lost files and folders on one of your cluster disks, but not on the disk containing the cluster quorum, you can use the Backup or Restore Wizard to restore that data.
For information, see To restore files from a file or a tape.
Scenario 2—Cluster Quorum Corruption
Symptom: The cluster nodes can boot up, but the Cluster service fails to start because the quorum resource cannot come online.
If this problem results from corrupted files on the quorum disk, follow the steps outlined in To recover from a corrupted quorum log or quorum disk. If the cluster quorum disk needs to be replaced, see Scenario 5, below. For a majority node set cluster, see Scenario 9, below.
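The paragraph above branches on two questions: whether the cluster uses a majority node set, and whether the quorum disk itself must be replaced. The small Python function below restates that decision as code so the branches are explicit. The enum and function names are hypothetical, and the function returns only the name of the section to follow.

```python
from enum import Enum

# Illustrative sketch: QuorumType and quorum_recovery_path are hypothetical names
# that restate the Scenario 2 decision in this topic.


class QuorumType(Enum):
    SHARED_DISK = "shared quorum disk"
    MAJORITY_NODE_SET = "majority node set"


def quorum_recovery_path(quorum_type: QuorumType, disk_needs_replacement: bool) -> str:
    """Return the section of this topic that applies to a quorum resource failure."""
    if quorum_type is QuorumType.MAJORITY_NODE_SET:
        return "Scenario 9"
    if disk_needs_replacement:
        return "Scenario 5"
    # The quorum disk hardware is usable; only the files on it are corrupted.
    return "To recover from a corrupted quorum log or quorum disk"


if __name__ == "__main__":
    print(quorum_recovery_path(QuorumType.SHARED_DISK, disk_needs_replacement=False))
```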
Scenario 3—Cluster Quorum Loses Checkpoints
Symptom: Some resources fail to come online and the configuration data is out of date.
If you have recovered from quorum corruption by creating a new quorum log as described in Scenario 2 above, you may need to restore the matching checkpoints before the quorum resource can come back online.
Scenario 4—Cluster Disk Corruption or Failure
Symptom: The cluster disk cannot come online. Resources that depend on that cluster disk may also fail to come online.
First, run the chkntfs command to determine if the disk is merely corrupted. For information, see Chkntfs command.
If the cluster disk is corrupted or the disk hardware fails, you can restore the disk while still keeping your cluster up and running by using utilities in the Microsoft Windows Server 2003 Resource Kit. If you do not have access to these tools, you can still restore your cluster disk using the Backup and Recovery utilities included with
Scenario 5—Cluster Quorum Disk Failure
Symptom: The cluster nodes can boot up, but the Cluster service fails to start because the quorum resource cannot come online. Entries in the Event Log indicate hardware failures.
If the cluster quorum disk (the disk containing the quorum resource) fails, you can replace it while still keeping your cluster up and running by using utilities in the Microsoft Windows Server 2003 Resource Kit. If you do not have access to these tools, you can still replace your cluster quorum disk using the backup and restore utilities shipped with
Scenario 6—Single Cluster Node Corruption or Failure
Symptom: The node cannot join the cluster.
If the Event Log indicates that the cluster database on the local node is merely corrupted, you can perform a System State restore on that node to replace the local cluster database. For information, see To restore the cluster database on a local node. Alternatively, you can copy the latest checkpoint file (CHKxxx.TMP) from the quorum disk to the
If a single node fails in the cluster due to system disk or other hardware failure, follow these steps to rebuild the node and rejoin the cluster:
Scenario 7—Cluster Quorum Rollback
If your cluster is not functioning as expected, you can use the Backup or Restore Wizard to roll back your cluster to a previous configuration.
For information, see Restore the contents of a cluster quorum disk for all nodes in a cluster.
Scenario 8—Complete Cluster Failure
Symptom: None of the nodes can boot up.
If all nodes fail in a cluster and the quorum disk cannot be repaired, follow these steps:
Scenario 9—Majority Node Set Cluster Failure
On a majority node set cluster, the cluster database is not stored on a cluster disk central to all nodes, but is instead stored locally on each node at
Using the full backup set for each node, restore data to each node in the cluster. For information, see To restore files from a file or a tape. The Cluster service will replicate the latest version of the cluster database to all other nodes.
If you want to restore an older version of the cluster database, stop the Cluster service on all nodes of the cluster and delete the local copies of the database (all the files under the \MSCS\ folder) on those nodes. Then restore the cluster database to one node and restart the Cluster service on all nodes. The Cluster service will replicate the restored version of the cluster database to all other nodes.
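The reason for deleting every local copy before restoring an older database follows from the replication behavior described above: when the Cluster service starts, the latest copy found on any node is replicated to the others. If newer copies remain on other nodes, they win and the restored older copy is overwritten. The Python sketch below is a simplified model of that behavior, not of the Cluster service itself. It assumes a single integer version per copy (higher is newer), and the function names are hypothetical.

```python
from typing import Dict, Optional

# Simplified model only. Versions are plain integers (an assumption); None means
# the node has no local database copy (its \MSCS\ contents were deleted).


def start_cluster(local_db: Dict[str, Optional[int]]) -> Dict[str, Optional[int]]:
    """On start, replicate the newest database copy found on any node to all nodes."""
    copies = [v for v in local_db.values() if v is not None]
    if not copies:
        raise RuntimeError("no cluster database on any node")
    newest = max(copies)
    return {node: newest for node in local_db}


def restore_older_database(
    local_db: Dict[str, Optional[int]], target_node: str, restored_version: int
) -> Dict[str, Optional[int]]:
    """Model the documented procedure for restoring an older cluster database."""
    # Stop the Cluster service on all nodes and delete every local copy.
    wiped: Dict[str, Optional[int]] = {node: None for node in local_db}
    # Restore the older database to one node only.
    wiped[target_node] = restored_version
    # Restart the Cluster service on all nodes; the restored copy is replicated.
    return start_cluster(wiped)


if __name__ == "__main__":
    nodes = {"NODE1": 7, "NODE2": 7, "NODE3": 7}
    print(restore_older_database(nodes, "NODE1", 5))
    # Skipping the delete step: the newer copies on NODE2 and NODE3 win.
    print(start_cluster({"NODE1": 5, "NODE2": 7, "NODE3": 7}))
```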
In a majority node set cluster, if some of the nodes fail, and the cluster loses quorum, you can force the remaining nodes to form a quorum and restart the cluster. For more information, see To force quorum in a majority node set server cluster.
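A majority node set cluster keeps quorum only while more than half of its configured nodes are running, that is, at least floor(n/2) + 1 of n nodes. A five-node cluster therefore tolerates two failed nodes, and a four-node cluster tolerates only one. The Python sketch below shows the arithmetic for deciding when quorum has been lost and forcing quorum becomes the remaining option. The function names are hypothetical.

```python
# Illustrative arithmetic for majority node set quorum; function names are hypothetical.


def majority_needed(total_nodes: int) -> int:
    """Smallest number of running nodes that is more than half of the configured nodes."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one configured node")
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, nodes_running: int) -> bool:
    """True if enough configured nodes are running for the cluster to keep quorum."""
    return nodes_running >= majority_needed(total_nodes)


if __name__ == "__main__":
    for total in (3, 4, 5):
        print(total, "nodes: need", majority_needed(total), "running")
```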
Scenario 10—Application Data Loss in a Server Cluster
When restoring application data in a server cluster, follow the instructions provided in the documentation that shipped with your application.